This script describes the analysis of the ensemble of models generated by the ReplicaExchange macro. See the cluster analysis tutorial for the first part of the script and for how to run it.
In [ ]:
import IMP
import IMP.pmi
import IMP.pmi.macros
In [ ]:
is_mpi = True  # run in parallel; requires mpi4py
model=IMP.Model()
First, construct the analysis class by indicating where the files are and whether you want to merge different runs together:
merge_directories
is a list of the directories containing the different runs you want to merge into a single analysis.
rmf_dir
is the name of the directory containing the RMF files.
global_output_directory
is the name of the directory, within each of the merge_directories, that contains the RMF files and the stat files.
stat_file_name_suffix
is the suffix of the stat file names in the directory(ies) /merge_directories/global_output_directory/.
In [ ]:
mc = IMP.pmi.macros.AnalysisReplicaExchange0(
    model,
    stat_file_name_suffix="stat",
    merge_directories=["replica_exchange_directory_1",
                       "replica_exchange_directory_2"],
    global_output_directory="/output/",
    rmf_dir="rmfs/")
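The directory arguments above can be pictured with a small standalone sketch. It does not call IMP; the helper `expected_locations` is hypothetical, and it assumes that `rmf_dir` sits inside `global_output_directory`, which in turn sits inside each entry of `merge_directories`. The real macro may assemble its paths differently.

```python
def expected_locations(merge_directories, global_output_directory, rmf_dir):
    """Return one (stat_directory, rmf_directory) pair per merged run.

    Hypothetical helper: it only illustrates the assumed layout
    <run>/<global_output_directory>/<rmf_dir> and is not part of IMP.
    """
    # Strip surrounding slashes so "/output/" and "rmfs/" join cleanly.
    out = global_output_directory.strip("/")
    rmf = rmf_dir.strip("/")
    locations = []
    for run in merge_directories:
        stat_directory = "/".join([run.rstrip("/"), out])
        rmf_directory = "/".join([stat_directory, rmf])
        locations.append((stat_directory, rmf_directory))
    return locations


locations = expected_locations(
    ["replica_exchange_directory_1", "replica_exchange_directory_2"],
    "/output/",
    "rmfs/",
)
for stat_directory, rmf_directory in locations:
    print(stat_directory, rmf_directory)
```

Under these assumptions, the stat files of the first run would be looked for in replica_exchange_directory_1/output and its RMF files in replica_exchange_directory_1/output/rmfs.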
Second, list the features you want to extract from the stat files:
In [ ]:
feature_list = ["ISDCrossLinkMS_Distance_intrarb",
                "ISDCrossLinkMS_Distance_interrb",
                "ISDCrossLinkMS_Data_Score",
                "GaussianEMRestraint_None",
                "SimplifiedModel_Linker_Score_None",
                "ISDCrossLinkMS_Psi_1.0_",
                "ISDCrossLinkMS_Sigma_1_"]
Last, set up the clustering:
skip_clustering=True
the clustering is not performed at all, but the best scoring models are still extracted.
prefiltervalue=300
is the upper bound on the score of the models to consider. It is needed for memory efficiency.
number_of_best_scoring_models
is the number of best scoring models to extract.
first_and_last_frames=[0,0.5]
is the portion of the trajectory on which to perform the analysis. For example, [0,0.5] means that the best scoring models are extracted
from the first half of the ensemble; [0.5,1.0] means from the second half.
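To make the meaning of first_and_last_frames concrete, here is a minimal sketch of how a pair of fractions could be turned into a slice of frame indices. The function `frame_window` is hypothetical, and the truncation with `int` is an assumption; the macro may round or order the frames differently.

```python
def frame_window(n_frames, first_and_last_frames):
    """Map fractional bounds such as [0, 0.5] to integer slice bounds.

    Hypothetical helper, not part of IMP: fractions are assumed to be
    converted to indices by truncation.
    """
    first, last = first_and_last_frames
    if not 0.0 <= first <= last <= 1.0:
        raise ValueError("fractions must satisfy 0 <= first <= last <= 1")
    return int(first * n_frames), int(last * n_frames)


# An ensemble of 1000 frames, restricted to its first half.
start, stop = frame_window(1000, [0, 0.5])
frames = list(range(1000))[start:stop]
print(start, stop, len(frames))
```

With [0.5,1.0] the same helper would select the second half of the frames.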
In [ ]:
mc.clustering("SimplifiedModel_Total_Score_None",
              "rmf_file",
              "rmf_frame_index",
              prefiltervalue=300,
              number_of_best_scoring_models=100,
              skip_clustering=True,
              feature_keys=feature_list,
              first_and_last_frames=[0, 0.5],
              is_mpi=is_mpi,
              get_every=1)
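The combined effect of prefiltervalue and number_of_best_scoring_models can be sketched in plain Python. This is an illustration under stated assumptions, not the macro's implementation: lower scores are taken to be better, the threshold is taken to be inclusive, and the function `select_best` and the sample records are hypothetical.

```python
import heapq


def select_best(records, prefiltervalue, number_of_best_scoring_models):
    """Keep records scoring at most prefiltervalue, then return the N lowest.

    Each record is a (score, rmf_file, rmf_frame_index) tuple.
    Hypothetical helper, not part of IMP.
    """
    # Prefilter first, so that only plausible models are held in memory.
    kept = [record for record in records if record[0] <= prefiltervalue]
    # Assumption: a lower score means a better model.
    return heapq.nsmallest(number_of_best_scoring_models, kept,
                           key=lambda record: record[0])


# Made-up sample data, for illustration only.
records = [
    (412.0, "0.rmf3", 3),
    (120.5, "0.rmf3", 7),
    (298.1, "1.rmf3", 2),
    (87.3, "1.rmf3", 9),
]
best = select_best(records, prefiltervalue=300,
                   number_of_best_scoring_models=2)
for score, rmf_file, rmf_frame_index in best:
    print(score, rmf_file, rmf_frame_index)
```

Filtering before sorting is what makes the prefilter useful for memory: models above the bound never need to be kept, however many frames the merged runs contain.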